System and method for extraction of data from documents for subsequent processing

ABSTRACT

The present invention comprises an image based document processing and information management system and apparatus. It provides a more efficient method and apparatus for handling large volumes of form based business transactions using a digital image-based system for the capture, identification and processing of images, statistics and business data. The system converts documents, such as forms and supporting pages, into digital data which can be used to update computer records and to manage and support the adjudicative processing of business transactions by human operators at computer terminals.

BACKGROUND OF THE INVENTION

This invention relates to an image based document processing system and apparatus for converting paper documents into electronic data and electronic images and managing the transactions initiated by those documents using both the images and data extracted from the images. The system manages document entry and flow within a business or other organization by allowing user interaction with the electronically captured document.

In the processing of transaction documents by a large business or governmental agency, there is generally a need to accomplish at least three basic objectives. The first objective relates to the capture of data so that it can be electronically stored, for example, by transmittal to a host computer system. This data may be pertinent to accounts payable, insurance policy-holder records, mail order records, taxpayer records or other business information. Secondly, there is a need to index and record the images of the documents from which the stored data was extracted for future retrieval and usage. Third, there is a need to manage the transactions requiring human judgement initiated by the documents and supply the captured data and image for use in the processing of a transaction, such as adjudicating an insurance claim or underwriting a loan application in the usual course of business. Until the present invention, there has not been a satisfactory method or apparatus for automatically capturing, identifying, indexing, and recording data and images from an incoming stream of documents of intermixed sizes and formats for future interactive use.

Many companies employ manual sorting of documents, generally beginning with the receipt of the documents in a mailroom. The disadvantages inherent in such systems of document sorting are numerous. For example, sorting documents in the mailroom is labor intensive and costly. Manual sorting results in far greater error and document misidentification than electronic classification accomplished pursuant to the present invention.

Manual sorting of the contents of an envelope is presently accomplished in several ways. Documents may be sorted by size, so that all documents with the same physical dimensions and format, such as 1040 Tax Forms, are manually segregated and grouped. This grouping is necessary because prior automatic document processing devices cannot accommodate documents of varying format. This pre-selection is necessary, using prior systems, to enable the software system to identify the data fields as they are geographically located on the document page. Pre-selection is also generally required in prior systems to accommodate paper feeding devices which will not tolerate varying sizes and weights of input documents.

With the introduction of optical readers, some flexibility was introduced into the system by first labeling each document with a unique identification which identifies the format of the document and allows the system to accommodate different forms without being separated into individual pre-sized groups. However, this system requires that the document format be pre-serialized, and many forms and documents exist without a serialized identification. Thus, many documents are not readable by this pre-serialized type of system. Accordingly, the capability of processing many of the different sized and format documents did not exist before the present invention.

Thus, even with pre-serialized systems, there has been a need in the industry for a document processing system which accomplishes electronic identification, delivery, storage, and retrieval of documents of various sizes and types, without the need for a special ID code or other serially printed mark to ascertain the identity of the document under observation. Also needed is a system which may be adapted for existing tax forms and other documents without the necessity of changing these standardized tax forms or other documents to include pre-printed marks or numbers.

U.S. Pat. No. 4,205,780 (the "'780 patent") relates to a document processing system with a video camera and television monitor. The typical document transport as described in the '780 patent has the capability of reading magnetic ink character recognition (MICR) data or OCR data encoded on the documents being processed, recording the data, and sorting the documents in a predetermined manner, but requires that the documents be sorted by bank employees before they are loaded into the document scanner and that all the document formats conform. "Header" and "trailer" cards function to separate each batch of documents. Header cards contain MICR data that identify the account being processed.

The present invention differs from the '780 patent disclosure. For instance, the '780 patent describes the use of MICR and OCR machine readable characters in processing check transactions. Stylized characters and special fonts are used, pursuant to the '780 patent, to allow machine recognition of forms and remittance documents which are pre-printed and manufactured, thus avoiding the automatic identification of other forms which are not specially pre-printed. However, the present invention uses the ability to automatically identify documents that are not pre-printed to keep envelope contents separate, processing the contents of the envelopes as a transaction.

The '780 patent requires that human operators routinely enter data by hand keying the data on a keyboard. The present invention, however, allows for data capture without operator keying in many applications. This results because the present system will locate and extract data from existing forms after identifying the forms. In cases where data is machine readable, no operator keying is necessary for most data with the present invention.

SUMMARY OF THE INVENTION

In contrast to the prior data capture systems described, the document imaging system of the present invention can capture an optical image of numerous intermixed documents of different sizes and formats, taken directly from opened mail, serially number them before or after image capture, automatically separate the checks, identify the form or document under observation, and manipulate the image data in an advantageous and required manner. After identification of the document, the present system can electronically carve and read specific data fields automatically. Human operators key correct the data for automatic auditing in a manner that is much more efficient than previous data entry systems. The invention permits less skilled operators to specialize, thereby requiring less training of these relatively unskilled operators.

One advantage of the present system over previously known data capture systems is that known systems generally require that documents received in a mailroom or other data collection center be sorted into homogeneous groups. For example, a clerk in a company mailroom using previously existing systems is required to sort, batch, and count various business forms and paper materials prior to sending these materials to an optical scanning device or data entry department.

Accordingly, the present invention provides a novel method and apparatus that overcomes the limitations and disadvantages of the prior art. The present invention also speeds up the process of document processing so that a higher volume of transactions can be processed by allowing multiple transports and terminals to operate in parallel on the same communications network. Further, the present invention reduces the number of errors which were heretofore considered to be inherent in a document processing operation.

A significant advantage of the present system for document retrieval and storage is that the data keying operator or adjudicator need not be directly involved with aspects such as the time of receipt or the location of a particular document or data field. In the preferred embodiment, incoming mail is extracted from envelopes in a mailroom and the pieces of mail are immediately scanned on an optical scanner serially, with no particular vertical orientation, i.e., with their wording right side up or upside down. The documents are sequentially numbered with a numbering device as they pass through the optical scanner, and are separated on a stacker separator to separate selected pages or checks from document forms and other pages. The same sequence number is electronically assigned to the captured image of the document, and to the data extracted from the document. This allows for subsequent hard copy retrieval for rescanning in cases in which an image is illegible, or for other evidentiary reasons. In these cases it is possible for an operator to retrieve the original hard copy based upon the partial image and the item sequence number. Thus, the adjudicator may return to the original hard copy if necessary.

As a document page proceeds through the scanner, a digital picture of the page is taken. This image is captured at a resolution ranging from 150 to 400 pels.

The next step relates to the identification of the document which now resides in the system as a unique captured electronic image or graphics screen. The software in the present system carves a previously ascertained identification area from the document. In order that this may be accomplished, a number of identification areas are chosen in advance by a designer from the existing printed forms using an interactive computer display. The co-ordinates of identification areas are stored in the computer and accessed for carving the identification area from the documents, thus identifying the document. Forms need not be redesigned to enable automatic identification.

After document identification is complete, graphical data areas are carved for recognition and correction if necessary. The electronic image of the document, compressed if desired, is sent over a local area network and stored on magnetic disk for ready access to other portions of the system. The extracted and audited data is sent to a host computer for processing.

The present invention may be advantageously adapted to existing tax forms and other documents because a designer may choose an existing identification area or word for identifying the document. This advantage is not available with previously known systems.

The chosen identification area presently existing on the document is used as a geographical reference point for the entire spatial image of the document. All other graphical data areas on the document are carved in reference to the identification area located by the processing system.
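By way of illustration only, carving relative to the identification area can be sketched as a simple coordinate translation. The sketch below is in Python, which is not part of the disclosure; the names Box and carve_box and the use of pel coordinates with a top-left origin are assumptions made for the example, and only translation (not skew) is handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Rectangle in pel coordinates (hypothetical representation)."""
    x: int
    y: int
    width: int
    height: int


def carve_box(id_area_found: Box, id_area_template: Box, field_template: Box) -> Box:
    """Shift a field's template rectangle by the offset between where the
    identification area was designed to be and where it was actually found."""
    dx = id_area_found.x - id_area_template.x
    dy = id_area_found.y - id_area_template.y
    return Box(field_template.x + dx, field_template.y + dy,
               field_template.width, field_template.height)
```

For example, if the identification area was designed at (100, 100) but located at (110, 95) in the captured image, a field designed at (300, 400) would be carved at (310, 395).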

In choosing the identification area, the designer preselects a word or an area already existing on the form. The geographic location of the area and the spelling of a word, or the electronic signature of the pel pattern of the area are used as identifying criteria when appropriate. The intelligent character reader (ICR) in the system may examine the pre-chosen identification areas to interpret a word on the document, and identify the document. For instance, on a 1040 tax form, the word identifier might be the word "exemptions" as in FIG. 6.

An additional advantage of the present optical scanning system invention is that a document may be inserted into the scanner either inverted or properly justified. The software accommodates and automatically inverts upside down identification fields and their related document images to properly read data in either position. When a dual sided scanner is used, documents may be inserted either face up or face down.
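As a minimal sketch of the inversion step, an upside down raster can be turned right side up by a 180 degree rotation, which reverses the order of the rows and the order of the pels within each row. The function name rotate_180 and the list-of-rows raster representation are assumptions for illustration; the disclosure does not state how the software performs the inversion.

```python
def rotate_180(raster):
    """Return a copy of a raster (a list of rows of pel values) rotated by
    180 degrees: rows are reversed, and each row is reversed left to right."""
    return [row[::-1] for row in raster[::-1]]
```

Applying the rotation twice returns the original raster, so the same routine serves for a page fed in either orientation once the orientation has been detected.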

The identified carved portion of the graphics screen image of the document is used to spatially reference graphical data areas of the graphics screen for conversion to usable character data. The first graphical data area carved from the image is known as the "predictor field." The predictor field has special significance. The predictor field is used to determine whether the image residing within a graphical data area was originally printed manually or by a machine. Because recognition techniques differ for these two different types of images, the initial determination of whether original data appearing as an image in the graphical data area was machine printed or manually written is used throughout to determine which type of processing each graphical data area will receive. If the desired data to be captured from the document is machine printed, the system will determine the pitch of the letters. This information is used in all subsequent carving and interpretation of the remaining graphical data areas on the document.

An additional advantage of the image processing subsystem of the present system is that it can logically deduce the amount of skew in the document image. Perfect alignment is seldom possible; therefore, document skew is normally present in the data fields due to variations in the exact spatial coordinates of documents which are mechanically inserted and fed into the optical scanning device. Once the amount of skew is determined, the program can deduce the proper adjustment for carving and extracting data from the data fields.
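The disclosure does not specify how skew is deduced. One possible approach, offered only as a hedged sketch, is to compare the direction of a line between two reference points on the form as designed with the direction of the same line as found in the captured image, and then rotate template coordinates by the difference. The function names skew_angle and adjust_point and the two-reference-point method are assumptions for illustration.

```python
import math


def skew_angle(expected_a, expected_b, found_a, found_b):
    """Angle in radians between the designed line a->b and the line as found
    in the captured image (hypothetical two-reference-point method)."""
    expected = math.atan2(expected_b[1] - expected_a[1], expected_b[0] - expected_a[0])
    observed = math.atan2(found_b[1] - found_a[1], found_b[0] - found_a[0])
    return observed - expected


def adjust_point(point, origin_expected, origin_found, angle):
    """Map a template coordinate into the skewed image by rotating it about
    the reference point and translating to where that point was found."""
    dx = point[0] - origin_expected[0]
    dy = point[1] - origin_expected[1]
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (origin_found[0] + dx * cos_a - dy * sin_a,
            origin_found[1] + dx * sin_a + dy * cos_a)
```

Each corner of a graphical data area could be passed through adjust_point before carving, so that the carved region follows the skewed page rather than the ideal template.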

A further significant advantage of the present invention is its adjudicative ability. The efficiency of data correction/entry clerks is greatly increased by the present system. The present system reduces the adjudicative or decision making functions required of data entry personnel because they key only what they see and then only from a discriminate graphical data area.

High productivity rates are thus possible with the present system. For instance, data corrector/entry operators may increase their keystroke rate productivity to levels as high as 20,000 keystrokes per hour. Unlike previously known systems, it is not necessary that personnel be aware of the type of document under observation or the relationship among specific fields from which data is extracted. It is only necessary that data corrector/entry personnel key information as it is rapidly and continuously placed in front of them. These operators may specialize in certain fields such as, for instance, only numerical fields, such as social security number fields, or only alphanumeric fields such as name and address fields. This specialization advantageously reduces error and costs associated with operator training. This allows, for example, an operator with only a numerical key pad to process only numerical fields, allowing for high input rate and decreased cost.
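Operator specialization of this kind can be pictured as sorting carved fields into separate work queues by field type, so that an operator draws only from the queue matching his or her specialty. The sketch below is illustrative only; the function name queue_fields_by_kind, the tuple layout, and the kind labels "numeric" and "alphanumeric" are assumptions and do not come from the disclosure.

```python
from collections import defaultdict


def queue_fields_by_kind(fields):
    """Group carved fields into per-kind work queues.

    `fields` is an iterable of (sequence_number, field_name, kind) tuples,
    where kind is a hypothetical label such as "numeric" or "alphanumeric".
    Arrival order is preserved within each queue.
    """
    queues = defaultdict(list)
    for sequence_number, field_name, kind in fields:
        queues[kind].append((sequence_number, field_name))
    return dict(queues)
```

An operator with only a numerical key pad would then be served exclusively from the "numeric" queue, without needing to know which document each field came from.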

The present invention obviates the need to manually deliver documents to various departments or groups within an organization. This results because the documents are converted to electronic data and images, and are automatically available to host computers and to data/document manipulation personnel through interface work stations.

The present invention, in its preferred embodiment, uses a wand mechanism to separate different transactions. The contents of each envelope are termed a "transaction". Separator cards are not required with the present invention to separate each transaction. This is a significant advantage because separator cards waste valuable computer memory and time that might otherwise be available for data manipulation or storage. More significantly, the use of a wand allows for the contents of an envelope to be processed sequentially together, thus allowing for transaction integrity. Wands may also be used to separate batches of transactions.

The present invention allows for automatic character recognition of machine printed characters, such as typewritten characters, upon existing forms, such as 1040 tax forms. These forms are "read" for a determination of the data residing thereon without changing the forms to accommodate the system. Special stylized characters and special pre-printed fonts are advantageously not required with the present invention.

Another significant advantage of the present invention is that data keying errors may be reduced by re-routing or circulating the data in a logical error reduction sequence. For example, an audit in the system tests data for accuracy once it has been read and keyed by an operator. The auditing system uses a second operator who keys the same data keyed by a first operator. The system then compares the two data fields to determine whether or not they are identical. When the data fields are not identical and the keyed data has failed the audit, the system uses a third operator. In that case, one of the data fields previously keyed will be routed to the third adjudicative operator for keying. This third data field, when matched with one of the first two data fields, proves, with almost complete accuracy, which of the original two data fields was keyed incorrectly. The adjudicative subroutine then chooses the "matched" data in preference to the incorrect data. Operators are usually unaware whether they are keying the first, second or third iteration. Thus, keystroke errors may be essentially eliminated with the present invention.
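The audit sequence described above can be sketched as a small decision routine. The function name audit_field and the use of None to signal "not yet resolved" are assumptions for illustration. The disclosure does not say what happens when the third keying matches neither of the first two, so this sketch simply reports that case as unresolved.

```python
from typing import Optional


def audit_field(first: str, second: str, third: Optional[str] = None) -> Optional[str]:
    """Return the accepted value for a keyed field, or None if unresolved.

    first, second: values keyed independently by two operators.
    third: value keyed by a third operator, supplied only after the first
    two disagree.
    """
    if first == second:
        return first          # audit passed; no third keying needed
    if third is None:
        return None           # audit failed; route the field to a third operator
    if third == first:
        return first          # the second operator keyed incorrectly
    if third == second:
        return second         # the first operator keyed incorrectly
    return None               # assumption: no match at all stays unresolved
```

A caller would invoke the routine once with two values and, if it returns None, again after the third operator has keyed the field.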

The present invention also provides a method of evaluating the accuracy of data key operators. A statistical summary of operator errors and operator performance may be retained in memory for future use in evaluating data operators.

One step in the process of the present system is the recording of the image of each document in a digital medium as a graphics image. This recording of the image may be on magnetic or optical disks, microfilm, 8 millimeter magnetic or optical tape, or some other suitable digital data recordation and storage means. In many applications, such as the insurance industry, data must be stored, and yet available, for extended periods of time. An ongoing file may require that the data be retrieved and operated on many times before a file is closed.

Once data is recorded and properly indexed by keys and addresses, for example within the data base, the system or an optional host computer manages work flow and directs the routing of transactions and associated document images to specific operators. In this manner, an unlimited number of adjudicators and employees may access, correct, or view the data simultaneously in an interactive mode. This system of document availability to a number of operators greatly increases the efficiency of a business organization.

BRIEF DESCRIPTION OF THE DRAWINGS

The advantages and technical advances of the preferred embodiment of the present invention can be discerned by reference to the following drawings and schematic diagrams:

FIG. 1 depicts a typical block diagram of the configuration of the preferred embodiment of the document capture, processing, storage, and retrieval system of the present invention;

FIG. 1(a) represents a more detailed block diagram of the image acquisition subsystem shown in FIG. 1 which shows the paper path and digital image production;

FIG. 1(b) is an alternative block diagram of the preferred embodiment whereby separate "nodes" cooperate within the local area network;

FIG. 1(c) depicts the block diagram of the application subsystem responsible for managing data and image flow as individual graphics into and out of storage;

FIG. 2 is a perspective view of the hardware configuration for the typical image acquisition subsystem shown in block diagram in FIG. 1(a) constituting document feeding, printing, scanning, separating, and stacking devices;

FIG. 3 is a block diagram depicting the sequence of steps in the preferred embodiment of the present invention;

FIG. 4 represents, as an illustration, the application of this invention to a typical hand written social security number field which has been electronically carved from a document as it appears on a terminal, displayed as a graphical data area;

FIG. 5 depicts a hand written alpha-numeric field in the form of a name and address field as it appears, displayed on a terminal screen as a graphical data area;

FIG. 6 shows an operator selecting the word identifier "exemption" on a typical federal tax form in another application of the preferred embodiment of this invention;

FIG. 7(a) depicts a single character reject as it appears displayed on a correction terminal screen as a graphical data area and a character string with a universal character in the form of an "*" replacing the rejected character;

FIG. 7(b) depicts multiple separate character rejects as they appear displayed on a correction terminal screen as graphical data areas and corresponding character strings, each having a universal character in the form of an "*" replacing the rejected character;

FIG. 8 is a side view of the preferred automatic feeder apparatus of the document capture system according to the present invention; and

FIG. 9 is a top view of the preferred feeder shown in FIG. 8.

DESCRIPTION OF THE PREFERRED EMBODIMENT

One embodiment of the present invention is illustrated as a block diagram in FIG. 1 as incorporated into a document processing system organized around a local area network (LAN) which may be Novell. The local area network, indicated as block 36, forms the central component of the document processing and information management system 38. All subsystems including the image acquisition subsystem, indicated as block 24, communicate with the local area network 36. The image acquisition subsystem 24 conveys the captured image from an input device to the local area network 36. The input of documents may occur through a microfilm or microfiche scanner 30, paper scanner 32, or facsimile device 28.

Each subsystem contains modules of software which can function in a single dedicated computer, or which can be shared with other computer subsystems. The image acquisition subsystem 24, the capture management subsystem indicated as block 10, the application subsystem indicated as block 12, the data correction/entry workstation 14, the application support workstation 16, the storage management subsystem indicated as block 20 and the intelligent character reader (ICR) indicated by block 34 can all share a single computer environment in a very low volume use of the invention.

FIG. 1(b) shows an alternate arrangement for the document processing system for high volume applications of the present invention. A control computer 180, 184, 188, 194, 198, 204, 210 is provided to each subsystem for data processing. Image transformer 186 receives digital data via telecommunications from a facsimile or other remote source, indicated as block 187, by suitable means such as a serial port. Image transformer 186 converts the digital data to provide a graphics image of each document received from the remote location to the LAN via control computer 184. A second image transformer 190 receives digital data by suitable means such as a serial port from an optical scanner 192, and provides a graphics image of each document that is compressed or uncompressed to the local area network via control computer 188. An image character reader 182 provides ASCII data to the local area network 36. Storage 196 in the form of hard or optical discs receives and stores graphics images and character data from the LAN 36. An optional host computer 202, which may be an IBM, manages batches of transactions in conjunction with batch communication control 200. Interactive communication, e.g. IBM 3278, between control computer 204 and host computer 202 provides graphics output from a third image transformer 206 directly to host computer 202. Terminal 212 receives a graphics screen image and, when necessary for correction purposes, character string data. Terminal 212 provides operator input to LAN 36 through control computer 210.

The Image Acquisition Subsystem

The image acquisition subsystem 24 is responsible for the transfer of an image, compressed or uncompressed, in the form of a raster graphics image of each document, suitable for entry onto the local area network 36. In order that the graphics image may be produced, a signal is first received from a scanner or a facsimile device as illustrated by block 129 in FIG. 1(a).

Referring to FIG. 1, scanner 129 may be a microfilm or microfiche scanner as indicated by block 30, a document scanner as indicated by block 32, or a facsimile device as indicated by block 28. The facsimile device 28 may be a Gamafax® facsimile emulation device for use with an IBM PC AT compatible, with a modem, functioning at a 9600 baud rate. The microfilm or microfiche scanner 30 may be a Mekel® M400 available from Mekel Engineering in Walnut, Calif. The scanner 129, however, may not constitute all of these devices, or may constitute other suitable input devices.

The scanner 129 is connected via suitable means such as a serial port to an image transfer system 138 in the form of a Kofax Image Products Series 8200 board, preferably an 8204, located in an expansion slot of a control computer 140 that processes the signal from the scanner 28, 30, 32. Preferably, the control computer 140 is an IBM AT compatible computer with an 80386 mother board, a hard disk drive, two megabytes of expanded memory, a floppy disk and a Proteon LAN interface card mounted in one of its other slots. Additionally, an optional keyboard and display, with associated graphic support, may be connected to control computer 140 for user interaction with a captured image.

Where a document scanning device, as indicated by block 32, is used, the preferred device is a Fujitsu America scanner model 3090E(a) optical scanner for one-sided forms. A TDC scanner may be used for two-sided forms, although other suitable devices can also be substituted. An optical scanner 60 is shown in FIG. 2. A Digital Image Systems Company (or "DI") model 401 document separator 62 (see FIG. 2) receives documents from the document scanner 60.

Regardless of what type of scanner 129 is used, which may even constitute a signal from a conventional group 3 or group 4 facsimile device over local or remote telephone lines, the signal is in the form of a serial digital bit stream. In applications requiring more voluminous document processing, a separate control computer may be provided for each subsystem as seen in FIG. 1(b), each computer communicating with local area network 36.

The image transformer 138, FIG. 1(a), receives the signal, processes the signal into usable data and compresses the data into a first output image in the form of a unique graphics image of each of the documents processed by the scanner 129. The output from the image transformer 138 is passed to the local area network 36. Alternatively, an optional magnetic disk storage or other type of mass storage medium may receive graphics screen image data from control computer 140.

The left half of FIG. 1(a) depicts the paper path 130 in those cases in which the document received is paper. In order for each document to be converted into a usable signal that is receivable by the image transformer 138, each document must be fed one at a time to the document scanner, illustrated in FIG. 1 as block 32 and shown in FIG. 2 at 60. In order that this may be best accomplished, the present invention provides an auto feeder 1000, as shown in FIGS. 8 and 9, and described in detail below.

Generally, envelopes received in the mailroom contain multiple sheets of paper and sometimes a check. In the past, the contents of the envelopes were sorted by size by a clerk so all documents of the same size were manually counted, segregated and grouped. The contents of the envelopes could even be sorted into stacks in which each stack contains documents of the same format, such as 1040 tax forms. In addition, each document in a stack was required to be oriented in the same direction. But, such sorting is extremely time-consuming and costly.

However, the present invention is able to process documents of intermixed sizes and format without the necessity of each document format being identified by a unique identification stamped on each document. In order to process documents of intermixed sizes and format, the present invention provides an automatic feeder 1000 usable with different sized sheets of paper and checks, commonly termed documents hereafter, which are stacked together. The contents of each envelope are now kept together as a transaction and fed consecutively through the system first, for scanning, and afterwards, for archival serialization without being separated.

In the past, documents of the same format had to be processed together, thus forming a type of batch, identified by the common format of each document. Each batch was separated by "header" and "trailer" cards. Commonly, these cards contained magnetic ink character recognition data (MICR) that identified the common format of the documents of each batch. These cards flowed along through the system with the documents, taking up processing time and were scanned, serialized, and stored along with the images of the documents by the system, wasting valuable computer time and memory. More importantly, transaction integrity was broken because different document types from the same transaction were placed in different batches. Accordingly, expensive error prone procedures were required to maintain the relationship of the separated documents during processing and subsequent retrieval.

However, because the present invention processes documents which are of a different size and format, a transaction may constitute the contents of each unique envelope. This ability to handle documents of intermixed sizes is the strength of the system. Thus, not only may those documents which have a common format be processed together as a transaction, the present invention allows the contents of an envelope, which may constitute different size and format sheets of paper and a check, to be processed together. Even "white mail" which may contain address changes, complaints, instructions, etc., may now be processed in the same transaction and associated with an individual or account by virtue of being included in the same transaction. Because this new capability exists, transaction integrity can be maintained. The individual documents associated with each transaction can be kept associated, without allowing the images and data from one individual or account to be mixed with that of another.

In order to maintain this integrity, these transactions also require separation, but "header" and "trailer" cards are not required with the present invention. Instead, the present invention implements two different means of virtual separation, as described hereafter, which do not physically accompany the documents being processed by the system, but which determine the end of each transaction, allowing the present invention to electronically maintain the integrity of each transaction. This is a significant advantage because clerical labor and errors are avoided, and valuable computer time and memory are now freed, allowing the computer to process more transactions during a comparable time frame.

One preferred way to accomplish virtual separation is by a wand mechanism as described hereafter. Another preferred way to accomplish virtual separation is by placing a unique document, recognizable by the system, first in each transaction. With the second preferred way of accomplishing virtual separation, the operator places the unique document, e.g., a 1040 form of a tax return, first on the feeder. When the system encounters a second unique document, the beginning of another transaction is signaled, and the documents between the previous unique document and the second unique document, including the previous unique document but not including the second unique document, are identified as belonging to the first transaction. When the control computer 140 receives the signal, logical separation is established by suitable methods such as a unique data key to locate the database addresses of the graphics images of each of the documents, and subsequent character data interpreted from graphical data areas of each graphics image. When a unique document is not present, the operator uses the wand mechanism as described hereafter to establish virtual separation.
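The second way of accomplishing virtual separation can be sketched as a single pass over the identified documents in feed order, opening a new transaction each time the unique leading document is encountered. The names split_transactions and is_unique are assumptions for illustration, as is the choice to place any documents that arrive before the first unique document into a transaction of their own; the disclosure instead relies on the wand mechanism when no unique document is present.

```python
def split_transactions(documents, is_unique):
    """Group documents, in feed order, into transactions.

    documents: identified document types in the order scanned.
    is_unique: predicate returning True for the document type that the
    operator places first in each transaction (for example, a 1040 form).
    """
    transactions = []
    for doc in documents:
        if is_unique(doc) or not transactions:
            transactions.append([doc])    # a unique document opens a new transaction
        else:
            transactions[-1].append(doc)  # otherwise it belongs to the current one
    return transactions
```

For the feed order 1040, W-2, check, 1040, letter, with the 1040 form as the unique document, the routine yields two transactions: the first three documents and the last two.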

FIG. 8 represents an expanded detailed view of the feeder mechanism of the present invention encompassing the wand mechanism, as shown generally in the upper right portion of FIG. 2 (see document feeder 64 in FIG. 2). Referring to FIG. 8, the preferred feeder 1000, usable for feeding a stack of documents 1002 of intermixed size and format and for detecting the boundary of each transaction included in the stack of documents 1002, is shown. The feeder 1000 includes a base 1004 on which the stack 1002 is located, and means associated with base 1004 for feeding a single document 1006 from the stack of documents 1002 while leaving the remaining documents stacked on base 1004. The base 1004 has an elongated wedge shape and includes a planar face 1008 over which the stack of documents 1002 is located. Planar face 1008 is oriented at an angle to the horizontal with its front end positioned at its nadir or lower-most point. Each document is fed forward and downwardly across planar face 1008 of base 1004, as indicated by arrow 1009, to an optical scanning device 60, shown in phantom and also shown in the document feeder 64 in FIG. 2, which receives each document as it passes between rollers 1011.

A pair of vertically extending retainers 1013 may be provided to keep the documents in the processing path, thus preventing the documents from moving sideways and off the feeder 1000. Accordingly, planar face 1008 is provided with slots 1001 in which the retainers 1013 are located. The retainers 1013 can be adjusted apart as necessary via slots 1001 and tightened into position by thumbscrews 1003.

The means for feeding a single document includes a pick drive means in the form of a pair of spaced, longitudinal pick rollers 1010 located in front of the base 1004 that drive the lowermost document of the stack 1002 forward from the stack 1002 and feed the document while aligning the next document of the stack 1002 to be fed. The pick rollers 1010 are positioned laterally across the front end of planar face 1008, which is located at the front end of base 1004, with their axes lying along a plane parallel to the planar face 1008 of base 1004. The pick rollers 1010 lie laterally across the forward path of the documents, and turn counterclockwise as indicated in the figure at 1012 in a direction suitable for driving the lowermost document of the stack forward, as indicated by arrow 1014. Preferably, but not shown in the figures, each pick roller 1010 includes an elongated rod 1015, shown partially in FIG. 9, and a number of narrow roller members spaced along the length of the rod that are attached thereto for rotation therewith. In order to rotate each pick roller 1010 and its roller members, each rod includes means such as sprockets or drive belts attached thereto for rotation thereof such that both pick rollers 1010 rotate counterclockwise at the same time.

Located above pick rollers 1010 is a weighted bar 1019. Weighted bar 1019 includes a series of narrow rollers 1021 positioned above rollers 1017. Weighted bar 1019 is slidably located in a slot 1023 in ratchet 1025 to move freely in a vertical direction and, when present, aids in the initial flattening of documents prior to being fed and processed.

The means for feeding a single document also includes a means in the form of a longitudinal back scrubber roller 1016 adjacent the pick rollers 1010 for allowing only one document to leave the stack 1002 at a time. Back scrubber roller 1016 lies positioned parallel and above the forwardmost pick roller 1018 distal the front end of base 1004. Back scrubber roller 1016 lies longitudinally along the backside of pick roller 1018, biased against its backside. Back scrubber roller 1016 has a diameter less than the diameter of pick rollers 1010, and rotates counterclockwise as indicated at 1020. Referring to FIG. 9, back scrubber roller 1016 includes an elongated rod 1022, shown in phantom, and a number of narrow roller members 1024, shown in phantom, spaced along the length of rod 1022 that are attached thereto for rotation therewith. In order to rotate back scrubber roller 1016, the rod 1022 includes means such as sprockets or drive belts attached thereto for rotation thereof. Located above back scrubber roller 1016 is a cover 1027, pivotally attached to base 1004 for access to roller 1016.

Referring to FIG. 8, both pick rollers 1010 are rotatively mounted on a connecting web 1026 that is mounted to pivot about an eccentric roller 1028, mounted adjacent to the bottom of base 1004. Eccentric roller 1028 provides a biasing means for urging pick roller 1018 against back scrubber roller 1016 that is adjustable by rotating eccentric roller 1028. A tension spring 1030 attached to the rear portion of web 1026 extends downwardly to attach to the bottom 1032 of base 1004. Tension spring 1030 provides a second biasing means for biasing pick roller 1018 against back scrubber roller 1016, as more pressure is needed for thicker paper. The diameter of back scrubber roller 1016, the rate at which roller 1016 counter-rotates relative to roller 1018 while pivoting about eccentric roller 1028, and the tensile force of spring 1030 are such that pick roller 1018 is urged against back scrubber roller 1016 with a suitable force so that only one document at a time may leave the stack 1002 and pass between rollers 1016 and 1018.

The means for feeding a single document also includes a continuous drive means in the form of a pair of elongated belt drives 1034 located along base 1004 for urging the stack 1002 adjacent the pick rollers 1010. The belt drives 1034 are positioned parallel, but spaced along the base 1004 below the stack of documents 1002 with their longitudinal portions parallel to planar face 1008 of base 1004. The upper longitudinal portion of the belt drives 1034 engages the bottom of the stack of documents 1002, and the belt drives 1034 turn in a counterclockwise direction as indicated by arrow 1036, suitable for urging the stack 1002 forward over the pick rollers 1010 to a position in which all, or the lower portion, of the stack 1002 contacts along the backside 1038 of the back scrubber roller 1016.

Each belt drive 1034 includes a continuous belt 1040 which is located around a pair of drive wheels 1042 and rotates therewith. Each pair of drive wheels 1042 is connected to a suitable drive means so that all drive wheels 1042 rotate simultaneously.

The means for feeding a single document includes a detection means adjacent the pick rollers 1010 that is engaged by the movement of the lower portion of the stack 1002 adjacent the pick rollers 1010 for activating the pick rollers 1010. Referring to FIG. 8, the detection means is in the form of a longitudinal detector web 1044 that is biased against the front side of the forwardmost pick roller 1018, and extends outwardly and forwardly from the front side of the forwardmost pick roller 1018 below its apex. Detector web 1044 includes laterally intersecting elongated portions. Extending downwardly from web 1044 is a switch portion 1046 engageable with a complementary switch portion 1048 located below on base 1004 by the downward motion of web 1044 responsive to the forward flow of a document over web 1044, which indicates the presence of a document flowing forward between rollers 1016 and 1018. The engagement of switch portions 1046, 1048 provides a signal that activates the pick rollers 1010.

Extending downward from the rear of web 1044 is a bridge 1047 which intersects with an arm 1049. Arm 1049 is pivotally attached at its free end 1050 to the bottom of base 1004. Extending upwardly from the bottom of base 1004 adjacent the pivotal attachment of arm 1049 and base 1004 is an elongated support 1052. A suitable tension spring 1054 attached to arm 1049 between its free end 1050 and bridge 1047 is attached to support 1052 near its upper end and acts to bias arm 1049 upward, thus urging longitudinal web 1044 upwards.

Two preferred ways are employed by the present invention to establish virtual separation. One preferred way to accomplish virtual separation is by the use of a unique document in each transaction as discussed above. Another preferred way to accomplish virtual separation is by the wand mechanism. In order to accomplish virtual separation in this way, an isolating means in the form of an elongated wand 1056 is located in the stack of documents to isolate the documents of a transaction below the wand from the documents and transactions above the wand. Referring to FIG. 8, the stack of documents 1002 constitutes four sets of transactions 1058a, 1058b, 1058c, 1058d, each transaction being separated from its overlying transaction by an elongated wand 1056a, 1056b, 1056c.

A guide means in the form of a pair of parallel, but spaced elongated guides 1060 extends laterally upward adjacent the rear of the base 1004. Guides 1060 include laterally extending, spaced, adjustable longitudinal support rods 1062 that are carried in unseen slots in base 1004. Support rods 1062 are adapted for longitudinal motion within the unseen slots paralleling planar face 1008, and may be adjusted for moving guides 1060 forwards or backwards to accommodate different size paper. Accordingly, different lengths of wand 1056 may be provided for use in the guides as described in the following to best intermix different sizes of documents.

Referring to FIG. 9, an elongated slot 1064 extends inwardly along the longitudinal length of each inner, opposing face of elongated guides 1060. Each slot 1064 opposes the other to form a larger guide slot 1066 which extends longitudinally between the upper and lower ends of guides 1060.

Referring to FIG. 8, each wand 1056 is provided with a cylindrical bushing 1068 which surrounds one of its ends and which is slidable longitudinally within guide slot 1066 in a downwards direction as indicated by arrow 1070. Base 1004 includes an elongated slot 1072 which extends forwardly from the rear of base 1004 and connects with guide slot 1066. Slot 1072 is somewhat wider than the width of each wand 1056 so that each wand can pass easily therethrough. The elongated horizontal length of slot 1072 is long enough so that each wand 1056, when located with its bushing 1068 in guide slot 1066, can pass easily therethrough. A V-shaped neck 72, shown in FIG. 2, aids in quick entry of the bushings 1068 into slot 1066.

Again referring to FIG. 8, three wands 1056a, 1056b, 1056c are shown. Each wand 1056a, 1056b, 1056c is positioned with its bushing 1068 lengthwise in the guide slot 1066, and located with its free end within the stack of documents 1002 isolating transactions 1058a, 1058b, 1058c and 1058d from each other. Each wand 1056a, 1056b, 1056c extends over slot 1064 and each wand moves downward as the documents are fed forward one at a time, each wand falling consecutively through slot 1064 when the last document of the transaction located below that wand has been fed forward.

A means for detecting the separation or falling of each wand 1056 in order to determine the end of that transaction is included in the form of a sensor 1074, preferably photoelectric, attached to base 1004 adjacent slot 1064 and connected to the control computer 140 via a serial port or other means for communication therewith. An arm 1075 of the sensor 1074 is triggered by the falling of a wand 1056 through the slot 1072, and the sensor 1074, when triggered, communicates with the control computer 140 and signals the end of that transaction.

The signal received by the control computer 140 from sensor 1074 allows the control computer to establish virtual separation and isolate the documents of each transaction from the documents of another transaction, maintaining the integrity of the transaction. Thus, for example, as wand 1056a falls when the last document of transaction 1058a is passed forward, sensor 1074 signals the end of transaction 1058a, which indicates the beginning of transaction 1058b. Likewise, as wand 1056b falls when the last document of transaction 1058b is passed forward, sensor 1074 signals the end of transaction 1058b, which indicates the beginning of transaction 1058c. Likewise, as wand 1056c falls when the last document of transaction 1058c is passed forward, sensor 1074 signals the end of transaction 1058c, which indicates the beginning of transaction 1058d. At the end of transaction 1058d, an additional wand 1056 may be included, or the operator may manually trip sensor 1074, signaling the end of that transaction.
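The way the control computer might turn sensor signals into transaction boundaries can be sketched as follows. This is a minimal illustration under stated assumptions: the event tuples, the function name `split_by_wand`, and the image identifiers are hypothetical and do not come from the specification.

```python
def split_by_wand(events):
    """Group scanned documents into transactions using wand-fall signals.

    `events` is a sequence, in feed order, of ("doc", image_id) for each
    scanned document and ("wand", None) for each signal from the sensor.
    """
    transactions = []
    current = []
    for kind, payload in events:
        if kind == "doc":
            current.append(payload)
        elif kind == "wand" and current:
            # Sensor tripped: the documents fed so far form one transaction.
            transactions.append(current)
            current = []
    if current:
        # Documents fed after the last wand; they await a final wand
        # or a manual trip of the sensor.
        transactions.append(current)
    return transactions


events = [("doc", "a1"), ("doc", "a2"), ("wand", None),
          ("doc", "b1"), ("wand", None), ("doc", "c1")]
print(split_by_wand(events))
```

No separator image is ever scanned or stored in this scheme; the boundary exists only as a signal interleaved with the document stream.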

Consequently, a rearward-extending narrow vertical slot 1071, connecting with slot 1066, through which the rear portion 1073 of each wand passes, has a land 1075 at its bottom against which rear portion 1073 engages before falling. Land 1075 advantageously prevents the rear portion of the wands 1056 from prematurely falling and activating the sensor 1074 before the transaction has ended. The falling of the front portion of the wand at the end of the transaction causes the wand to tilt within slot 1066, thus allowing the rear portion 1073 of the wand to bypass land 1075 and fall. Accordingly, the horizontal length of slot 1066 is sufficiently long for bushing 1068 to tilt.

The falling of a wand 1056 as it leaves the stack of documents and separates from the continued processing flow of the documents gives rise to a signal being generated from sensor 1074 to the control computer 140. This signal, generated by the falling of a wand 1056, gives the system new capabilities, allowing the control computer 140 to establish virtual separation without using control cards or when no unique document is present, and maintain the integrity of each transaction.

This provides the system with at least two important new capabilities. First, by merely drawing up one of the documents of the transaction, such as a billing statement, any accompanying "white mail" may be quickly referenced as a graphics screen by referencing the transaction identifier generated in the control computer 140 on its reception of the virtual separation signal. This transaction identifier may, for example, constitute a unique serialized number that serves as a data key to locate the database addresses of the graphics screen for each document of the transaction. This allows quick access to "white mail" and other documents included in the transaction that otherwise might become lost in the mail room or other processing area and disassociated from the account or individual.
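The role of the transaction identifier as a data key can be sketched with a small in-memory index. The class `TransactionIndex`, its method names, and the integer storage addresses are assumptions made for illustration; the specification does not define a particular database layout.

```python
class TransactionIndex:
    """Hypothetical index from a transaction identifier to image addresses."""

    def __init__(self):
        self._next_id = 1
        self._by_transaction = {}   # transaction id -> [(doc_type, address), ...]
        self._by_address = {}       # image address -> transaction id

    def register(self, documents):
        """Assign a serialized identifier to a completed transaction."""
        tid = self._next_id
        self._next_id += 1
        self._by_transaction[tid] = list(documents)
        for _, address in documents:
            self._by_address[address] = tid
        return tid

    def companions(self, address, doc_type=None):
        """Return addresses of the other documents in the same transaction."""
        tid = self._by_address[address]
        return [a for t, a in self._by_transaction[tid]
                if a != address and (doc_type is None or t == doc_type)]


index = TransactionIndex()
index.register([("billing_statement", 100), ("white_mail", 101), ("check", 102)])
print(index.companions(100, "white_mail"))
```

Starting from the billing statement image, the operator reaches the accompanying white mail through the shared identifier rather than through any property of the white mail itself.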

Second, the system is allowed to establish virtual separation without requiring batch intensive labor, overhead computational time and storage space as required in prior systems where control cards are processed, scanned, and stored along with individual documents. Of course, where all the documents being processed constitute a single batch, instead of using a wand 1056 to signal an end of the batch, the end may be manually triggered by the operator using switch 1077 on the outside of the base 1004. Operation of the switch 1077 signals an end to the batch. Further, there may be other types of transactions for which manual triggering to signal the end of the transaction is preferable, and this is within the scope of the present invention also.

After each document is scanned, each document (as seen in paper path route 130 in FIG. 1(a)) proceeds to the sequence number printer 132. A Hewlett Packard model 402 audit trail printer, for printing identification numbers on documents before or after scanning, is a preferred device for printing identification numbers on documents. More preferably, a Hewlett Packard Think-Jet printer may be employed for printing identification numbers on documents. The Think-Jet has a movable printhead, and receives information from a sensor identifying the edge of each document. The information from the sensor is processed and passed to the printer to position the printhead so that the identification is printed in the margin of the document. Preferably, the identification is printed in the left margin, and the position for the printhead is determined from an analysis of the scanned image to assure that the printed number is always in the left margin.

After a sequence number is printed on the paper copy of the document, the paper proceeds to a paper stacker selector 134. The paper stacker selector 134 selectively separates checks from other documents and feeds the checks into the paper stacker assembly 136. Checks may be identified by the presence of ferrous based ink, or because they are less than four inches high. In this way, checks are advantageously separated and made available for deposit at a bank. The document separator 62 (depicted in FIG. 2) comprises the sequence number printer 132, the paper stacker selector 134, and the paper stacker assembly 136, as depicted in FIG. 1(a).

The Capture Management Subsystem

FIG. 1(b) shows a more detailed diagram of the capture management subsystem 10 and its intelligent character reader (ICR) 34. The ICR 34 (FIG. 1), in the preferred embodiment, is an IRIS image recognition integrated systems character reader, available from Image Recognition Integrated Systems Company in Belgium.

Referring to FIG. 1, the output signals representing an electronic raster graphics image of each document, uncompressed or compressed by conventional CCITT Group IV run length algorithms or the like to save space, are provided to the local area network by the image acquisition subsystem 24. The electronic images are provided from the local area network to the capture management subsystem 10. It is within the capture management subsystem that the output signals are used to identify the documents from which those output signals have emanated. The intelligent character reader 34 performs this function by reference to an identification area or an identification word found on the document itself.

The capture management subsystem 10 is assembled on a VME backplane. The capture management subsystem 10 includes, in the preferred embodiment, one or more Force® model 80386 computers which may be obtained from Force Computers, Inc. in Campbell, Calif., one or more Iris® brand image character readers, one or more separate Ciprico® Rimfire 3400 brand disc controllers available from Ciprico Company, and a Hitachi hard disk drive. Of course, similar equipment could be substituted in practicing the present invention.

The Storage Management Subsystem

The storage management subsystem 20 receives and stores signals representing both the electronic images and signals representing data fields within a document. These electronic images and data fields are provided to the control computer 158 from the local area network 36 (see FIG. 1(c)). The control computer has an interface to an optical disk storage, not shown in the Figures.

Preferably, storage management subsystem 20 is assembled from a PC/AT compatible computer using an 80386 board, Novell Network software, two megabytes of expanded memory, and a Western Digital Fasst® 7000 model disc controller with two 300 megabyte model DK514E38 hard magnetic disc storage devices.

Preferably, the optical disk storage includes a Sony optical disc controller, two Sony® 450 megabyte WORM (write once, read many) optical disc drives and a Cygnet® model 5000 jukebox with 19 slots for optical disc media. A jukebox is a robotic device that selects optical media from an array of slots and inserts optical media into an optical drive. As another option, Sony® model SMO-S501 erasable optical discs with 325 megabytes per side may be used. However, other comparable components could be substituted.

The Data Correction and Application Support Workstations

There may be several data correction and entry workstations indicated as block 14 in FIG. 1. Each of these is preferably an IBM compatible 80286 or 80386 computer, but other comparable equipment could be substituted. Each of these computers is equipped with, for example, a Hercules® 720×348, VGA 640×480, or EVGA 800×600 compatible display, or other high resolution display such as a Sigma L-Vue.

Preferably, each workstation 14 is provided with a Microsoft Windows or equivalent graphical operating system for controlling the display output. The workstation 14 receives the full raster binary image from the LAN system, and when the image is received by the workstation 14, the graphics image is given to the Windows operating system for display.

Each workstation 14 also contains a Proteon LAN interface card, and interfaces with the local area network 36. Accordingly, each workstation 14 communicates with other control computers on the LAN.

Preferably, the application support workstation indicated in FIG. 1 as block 16 is an IBM compatible 80386 AT equipped with a hard disk drive, a high resolution Sigma L-Vue display, decompression hardware or software, and a Fujitsu floppy disk drive. The application support workstation 16 connects the local area network 36 to an optional host computer 46. The link to a host computer 46 provides the capability through which human operators may access graphics images resident on the host computer, and change the processed data. Data flows in or out of the local area network 36 from the application support workstation 16.

The Publication Subsystem

The publication subsystem 18 provides an outlet from the local area network 36 for producing hard copies 44, micrographics 42, or facsimile output of stored information or images.

Publication subsystem 18 is preferably assembled from an IBM compatible PC AT model 286 computer equipped with a Proteon LAN interface board. Kofax® compression/decompression boards and output devices suitable to the application are included. If a hard copy output 44 is desired, a Hewlett Packard Laser Jet Series II laser printer may be used. If micrographics 42 are desired as an output from the publication subsystem 18, a model IBase® film recorder may be used. A facsimile device may provide data transmission with a Gamafax® modem 40 at a 9600 baud rate to a direct dial phone line.

The subsystems portrayed in FIG. 1 are typical of the preferred embodiment of the present invention. Of course, alternative devices or subsystems functionally equivalent to those depicted in FIG. 1 could be substituted. Additionally, functionally equivalent hardware can be substituted for the specific hardware components recited in the present specification.

Application Subsystem

The local area network 36 communicates with an application subsystem indicated as block 12 in FIG. 1, also known as an application processor. The application subsystem 12 preferably consists of an IBM Compatible Personal Computer, which is available from Systex in Carrollton, Tex. In the preferred embodiment, application subsystem 12 also includes an 80386 board, a Proteon model no. P1303 LAN interface card, a Western Digital® brand hard disc controller, a 300 megabyte hard disk and two megabytes of extended memory. An IBM compatible 3370 data communications link may connect the application subsystem 12 to a host computer.

Referring to FIG. 1, the application subsystem 12, which may exist in a separate computer, or may share a computer with other subsystems, controls image and data traffic from the storage management subsystem 20 to the application support workstation 16 as the adjudication process ensues after data capture. The application subsystem 12 supervises the extraction of data and transmission of data to the host computer 48.

Processing a Transaction

FIG. 3 portrays the typical flow of a transaction in the preferred embodiment of the invention. In step 80, intermixed sizes of sheets of paper and checks, commonly termed documents, are processed as previously described in reference to the image acquisition subsystem in FIG. 1(a). The auto feeder 1000 feeds each document sequentially, one at a time from the bottom of the stack 1002, and introduces it to the optical scanner 60 for data capture.

In step 82, the scanning device captures an image of the paper. Depending on the type of scanner used, such captured image may be at a variety of resolutions from 150 pels per linear inch to 400 pels per linear inch, or even more. The scanner produces a digital bit stream representation of the image that is transmitted to image transformer 138 via a serial port in computer 140.

The raster graphics images of each document produced by the transformer 138 are saved in their uncompressed form for the subsequent identification process and for other later usage. The graphics image of each document is also compressed using the standard CCITT algorithms. Each graphics image is passed to LAN 36 and received therefrom by the capture management subsystem 10. Both the graphics images and the carved graphical data areas are sent to the storage subsystem for storage on magnetic disc. Within the capture management subsystem 10, the image character reader 34 processes each graphics image against pre-selected identification areas for a match. Using the matched identification area for that graphics image, graphical data areas of each graphics image are spatially referenced from the matching graphical representation of the matched identification area and carved therefrom.

Step 84, together with steps 80, 88 and 90, permits the automatic processing of intermixed forms, checks, and documents. The integrity of each separate transaction must be maintained. That is, all the documents relating to a single transaction must be logically "held together" by the system. Referring to FIG. 2, the document feeder 64 provides a wand 68 as previously described which is usable for this purpose. The wand and related switch 70, as previously described and which is attached to the automatic feeder 1000, permit the definition of virtual transaction boundaries without consuming scanner throughput time. The falling of the wand causes switch 70 to generate a signal that is received by control computer 140. Upon the reception of the signal, computer 140 associates, for example, a unique data key with the preceding documents of the transaction. The data key serves to locate addresses of the graphics screen of each document of the transaction for subsequent retrieval from storage. Thus, transaction integrity is obtained, allowing for easy electronic retrieval of all documents of the transaction, including "white mail". Transaction integrity must be maintained; that is, one must, for example, keep the images and data from policyholder A's claim (doctor's report, drug store receipt, etc.) logically associated with A's claim, and not intermixed with policyholder B's claim.

The preferred way to stack each transaction is to load the primary or principal transaction document first. For example, in a tax form processing transaction, the Form 1040 would be loaded first. A specially designed separator document may be used to indicate separation within a transaction. This manual separation may be necessary when a tax return, for example, arrives without a Form 1040 as part of the transaction.

Preferential placement of a unique document as previously disclosed, manual use of the wand switch 70, or insertion of a unique formatted separator document, may also be used to effect transaction integrity.

In step 86, a unique serialized number is printed on all the documents. Each unique number may include batch identification, transaction identification, transport identification, operator identification, date, time, or other identifying variables. That unique number may also be assigned to the image, stored on magnetic disc, and will be assigned to the data to be subsequently extracted from the image. The unique number serves as the primary key or locator number for the original document.
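One way such a serialized number might be composed is sketched below. The field widths, the field order, the hyphen separators, and the function name `make_sequence_number` are all assumptions for illustration; the specification lists only the kinds of identifying variables that may be included.

```python
from datetime import datetime


def make_sequence_number(batch, transaction, sequence, operator, when):
    """Build a hypothetical locator number from identifying variables."""
    stamp = when.strftime("%Y%m%d%H%M%S")   # date and time of capture
    return f"{batch:04d}-{transaction:05d}-{sequence:03d}-{operator}-{stamp}"


number = make_sequence_number(12, 345, 2, "OP7", datetime(1990, 1, 2, 3, 4, 5))
print(number)
```

Because the same string can be printed on the paper, attached to the stored image, and attached to the extracted data, it can act as the common key among all three.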

In step 88, identification and separation of checks from other documents occurs. The preferred embodiment of the invention identifies checks by measuring the dimensions of the image of the check or sensing the presence of the ferrous-based ink printed in the line of characters on the bottom of the check. One or both of these two check-identifying criteria is used to physically separate the check from the other transaction documents by sending the check to a separate output pocket so that checks may later be deposited at a bank.
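The two check-identifying criteria can be expressed as a short sketch. The four-inch height threshold is taken from the description of the paper stacker selector; the function names, the pocket labels, and the boolean flag standing in for the ink sensor are hypothetical.

```python
def is_check(height_pels, pels_per_inch, ferrous_ink_detected=False,
             max_height_inches=4.0):
    """Classify a document as a check by image height or ferrous-based ink."""
    height_inches = height_pels / pels_per_inch
    return ferrous_ink_detected or height_inches < max_height_inches


def select_pocket(height_pels, pels_per_inch, ferrous_ink_detected=False):
    """Choose an output pocket; the pocket names are illustrative only."""
    if is_check(height_pels, pels_per_inch, ferrous_ink_detected):
        return "check_pocket"
    return "document_stacker"


# A 2.75 inch high item and an 11 inch high page, both scanned at 200 pels per inch.
print(select_pocket(550, 200), select_pocket(2200, 200))
```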

Steps 80 through 88 are performed within the image acquisition subsystem 24 as shown in FIG. 1. In step 90, forms and pages are automatically identified by the image character reader 34 when the system has been instructed or programmed to anticipate them, that is, when the location and characteristics of the identification area for each page have been defined. This is accomplished by sequentially comparing each of the pre-chosen identification areas in the capture management subsystem 10 to selected, spatially referenced geographic areas of the graphics screen image of the document.

Step 90 permits intermixing different forms, and even permits the intermixing of forms in the properly justified or inverted orientations. Graphics screen images of documents that are not identified automatically by the image character reader 34 are queued for manual identification by human operators looking sequentially at a screen or queue of images and keying in the identification. "White mail" (items included in customer transaction envelopes that are not expected or required for a particular business transaction) will appear in this queue, and be bypassed by the operator for subsequent data manipulation while being retained as a graphics screen image for other subsequent usage.

There are two different ways which may be used to automatically identify forms in step 90. One of the ways is to use the image recognition system 34 to recognize a graphical representation of the identifying word of the form. Each form has a specific geographic identification area containing a specific identification word or words. In order to accomplish this, a pre-chosen series of identifiers is processed against portions of the graphics image. If a match is found, its location within the graphics image is identified as the geographic identification area. Once the geographic identification area is found and identified, other graphical data areas can be located a vector distance measured in binary bits from the identification area.

One of the pre-chosen identifiers may be the word "exemptions" as it appears in the upper right margin of the IRS Form 1040. Another may be the word "averagable" as it appears in Schedule G of the IRS Form tax return in the upper middle of the form.

However, and referring to FIG. 6, in the event a graphics screen image is undecipherable due, for example, to extraneous matter on the original form, such as accidental ink marks, spilt coffee, etc., the unidentified image is queued for an operator who interprets the image and identifies the image to the system as a 1040 Tax Form. All or part of a digital image of 1040 Form 126 may be queued and available for operator review in this event. This process is indicated in FIG. 3 by block 92. Step 92 further relates to manual identification of obsolete forms. In this case, the graphics screen will be available for interpretation, but carved graphical data areas will not be.

An alternative means to automatically identify forms pursuant to step 90 is by using the ICR to analyze the same geographic template area of each form and evaluating the distribution of black pels on the white field. The so-called "signature" of this area is unique to each form. Histograms and intersection counts are two criteria used in the present system. Such counts and transformations thereof are accumulated by the ICR for each horizontal and/or vertical line of pels within the template area. An advantage of this technique is that only one geographic template area need be developed from each form.
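A minimal sketch of such a signature follows. It assumes the template area is given as a list of rows of 0 (white) and 1 (black) pels, and it interprets an "intersection count" as the number of white-to-black transitions crossed along a scan line; that interpretation, the sum-of-absolute-differences comparison, and all function names are assumptions rather than the ICR's actual method.

```python
def line_histogram(lines):
    """Count the black pels on each line."""
    return [sum(line) for line in lines]


def line_intersections(lines):
    """Count white-to-black transitions along each line."""
    counts = []
    for line in lines:
        previous, crossings = 0, 0
        for pel in line:
            if pel and not previous:
                crossings += 1
            previous = pel
        counts.append(crossings)
    return counts


def signature(area):
    """Accumulate histograms and intersection counts by row and by column."""
    columns = [list(column) for column in zip(*area)]
    return (line_histogram(area), line_intersections(area),
            line_histogram(columns), line_intersections(columns))


def best_match(area, templates):
    """Return the template name whose stored signature is closest to the area's."""
    observed = signature(area)

    def distance(stored):
        return sum(abs(a - b)
                   for obs, ref in zip(observed, stored)
                   for a, b in zip(obs, ref))

    return min(templates, key=lambda name: distance(templates[name]))


form_a = [[1, 0, 1], [0, 0, 0], [1, 1, 1]]
form_b = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
templates = {"A": signature(form_a), "B": signature(form_b)}
noisy_a = [[1, 0, 1], [0, 0, 0], [1, 1, 0]]   # form A with one pel dropped
print(best_match(noisy_a, templates))
```

Choosing the nearest stored signature instead of requiring an exact match is one way to tolerate a small amount of scanning noise.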

Once a particular form is identified, the relative location of the identification word, or "signature" within the carved identification area, is used to adjust the carving of the data fields to be accomplished in step 94. This process adjusts for misregistration and skew that is introduced as part of the form printing process or the scanning process, whether it be from film, paper or facsimile.

The automatic carving in step 94 can be used to exploit existing widely used forms without burdening the automatic identification feature to identify forms that occur very infrequently. In less advanced systems without an intelligent recognition subsystem 34 to accomplish step 90, step 92 may be accomplished concurrently with step 96 relating to key correction and entry if the entire page image is displayed to an operator.

Step 94 relates to the carving of graphical data areas as described in the preceding, once the form has been identified. In step 90, the location and identity of the identification area were determined. Once the location and identity of the identification area have been established, other graphical data areas can be located at vector distances, measured in binary bits, from the identification area. The identified graphical data areas are now carved or extracted from the graphic image. These carved graphical data areas are sent to the recognition subsystem 34. Step 94 is optional in the sense that systems may exist and function without it. However, without step 94, all data must be keyed.
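
By way of illustration only, carving a graphical data area at a vector distance from the identification area may be sketched as follows. The image is assumed to be a list of pel rows, and the anchor, vector and size are given as (x, y) or (width, height) pairs in pels. The function name `carve` and its parameters are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]


def carve(image: Image, anchor: Tuple[int, int], vector: Tuple[int, int],
          size: Tuple[int, int]) -> Image:
    """Copy the area of `size` located at `anchor + vector` out of `image`."""
    left = anchor[0] + vector[0]
    top = anchor[1] + vector[1]
    width, height = size
    if left < 0 or top < 0 or top + height > len(image) or left + width > len(image[0]):
        raise ValueError("carved area falls outside the image")
    # Slicing copies the pels, so the source image is left unaltered.
    return [row[left:left + width] for row in image[top:top + height]]
```

Because the carved area is a copy, the original graphics image remains available for later retrieval.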

Only the area immediately surrounding each graphical image of data to be captured is carved and sent to the recognition subsystem 34. The first such carved graphical data area read is designated the "predictor field," and it is used to distinguish between hand print and machine print data since different character recognition techniques may be used for each. The predictor field can also be used to determine pitch for machine printed characters.

With or without step 94, step 96 is essential because character readers are imperfect and sometimes the output of the character reader must be corrected. If the system does not include one or more character readers, then all data must be keyed after carving.

In the event that the predictor field indicates the carved graphical data areas are undecipherable, as may occur when the entries of a document were handwritten, each of the carved graphical data areas, such as shown in FIGS. 4 and 5, may be sent to a different clerk for interpretation and keying. This permits individual clerks to specialize. For example, one group of clerks may specialize in keying only numeric fields, which need only a numeric keypad, and another group may specialize in keying alphanumeric fields, which require the use of the full keyboard at the data correction/entry workstation 14. Such specialization improves operator productivity by permitting operators to build a rhythm. FIG. 4 shows a typical handwritten numeric social security field 120. FIG. 5 shows a typical handwritten alphanumeric name and address field 122.
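
By way of illustration only, routing carved graphical data areas to specialized clerks may be sketched as one work queue per field type. The field kinds "numeric" and "alphanumeric" follow the example above; the names `CarvedField` and `route_to_queues` and the field identifiers are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, NamedTuple


class CarvedField(NamedTuple):
    """A carved graphical data area awaiting keying (hypothetical record)."""
    field_id: str
    kind: str  # e.g. "numeric" or "alphanumeric"


def route_to_queues(fields: Iterable[CarvedField]) -> Dict[str, Deque[str]]:
    """Place each carved field on the queue for its kind, preserving arrival order."""
    queues: Dict[str, Deque[str]] = defaultdict(deque)
    for field in fields:
        queues[field.kind].append(field.field_id)
    return queues
```

A clerk with only a numeric keypad would then draw work exclusively from the "numeric" queue.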

Specialization may even extend to a specific field. For example, the social security number for tax return processing as seen in FIG. 4 is a fixed-length numeric field and does not require the stroking of the enter key at the workstation 14 to indicate that the keying of the field is complete.

Operators may specialize in correcting individual characters which have not been recognized. In processing each carved graphical data area, the graphical data area is converted by the image character reader 34 to a representative ASCII character string by processing individual graphical portions of the data area and converting each individual graphical portion to an ASCII character which is positioned in the string at a location associated with the location of the graphical portion from which it was derived. However, the image character reader 34 may not be able to convert all portions of the graphical data areas to an ASCII character. This may happen for several reasons, including imperfectly formed numbers or letters on the original document or accidental marks such as ink stains that partially or totally obliterate numbers or characters on the original document. Each unconverted portion is represented by a universal character, which may be an "*" or other suitable character such as a "?", in the character string. The universal character is intermixed in the character string and located within the character string at a position associated with the location of the unconvertible portions in the graphical data area.
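
By way of illustration only, the placement of the universal character may be sketched as follows. The character reader itself is not modelled: its output is assumed to be one entry per graphical portion, with `None` standing for a portion that could not be converted. The names `to_character_string` and `needs_correction` are hypothetical.

```python
from typing import List, Optional

UNIVERSAL = "*"  # the specification also mentions "?" as a suitable alternative


def to_character_string(portions: List[Optional[str]], universal: str = UNIVERSAL) -> str:
    """Build the string, keeping each character at the position of its portion."""
    return "".join(ch if ch is not None else universal for ch in portions)


def needs_correction(text: str, universal: str = UNIVERSAL) -> bool:
    """True when the conversion was partially imperfect."""
    return universal in text
```

The sketch assumes that the universal character cannot occur as legitimate data in the field; otherwise a separate list of unconverted positions would be needed.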

The system recognizes that the conversion was partially imperfect. Consequently, both the graphical data area and its representative character string are routed to a correction terminal 14 where both the graphical data area and the representative character string are displayed. The display in FIG. 7a shows one such graphical data area 216 and its representative character string 217 containing a universal character. FIG. 7b illustrates a display which has a number of graphical data areas 218, 219, 220, 221, each graphical data area having a representative character string 218a, 219a, 220a, 221a containing a universal character associated with it.

Referring to FIG. 7a for clarity, the operator examines the graphical data area 216 and interprets the portion of the graphical data area associated with the location of the universal character in character string 217. The operator then keys in a character using the keyboard which is representative of the unconverted portion of the graphical data. In FIG. 7a, this is an "8". The input is read by the computer, manipulated and positioned within the character string at the position associated with the universal character, thus replacing the universal character.
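
By way of illustration only, the replacement of the universal character by the operator's keystroke may be sketched as follows. The function name `apply_correction` is hypothetical, and the sample string is invented for the example rather than taken from FIG. 7a.

```python
def apply_correction(text: str, keyed: str, universal: str = "*") -> str:
    """Replace the first universal character in `text` with the keyed character."""
    if len(keyed) != 1:
        raise ValueError("exactly one character must be keyed")
    position = text.find(universal)
    if position < 0:
        raise ValueError("character string contains no universal character")
    # The keyed character takes the position of the universal character.
    return text[:position] + keyed + text[position + 1:]
```

Calling the function once per keystroke corrects a string containing several universal characters from left to right.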

The corrected character string may now be displayed, replacing the original character string containing the universal character. The corrected character string is now data keyed, or addressed by the control computer 140 and stored for subsequent use. A number of graphical data areas and their associated character strings may be displayed simultaneously as shown in FIG. 7b to reduce the number of required keystrokes and increase the speed of data entry. A prompt, which may be the position of the cursor on the screen, alerts the operator to which graphical data area is being examined and interpreted. Of course, the original graphics image from which the graphical data areas were carved need not be altered, leaving the original graphics image for subsequent retrieval.

Step 98 relates to using a computer program to audit data that has been either read by the character reader or keyed to determine its accuracy. An example of such auditing includes the use of check digit routines and testing for ranges. Audits are done automatically and failures are automatically recycled through step 96 using the pertinent queues. When differences in keying occur between operator A and operator B at different data correction/entry substations 14, or in the recognition subsystem 34, then fields or windows can be cycled a third time through step 96 to break the tie. As such, activity by a skilled operator at step 100 may be eliminated, resulting in substantial increases in efficiency and decreases in operating costs.
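
By way of illustration only, the audits and the tie-breaking pass may be sketched as follows. The specification does not name a particular check digit routine; the Luhn algorithm is used here purely as a well-known stand-in, and the majority rule in `resolve_keyings` is likewise an assumption about how a third pass breaks the tie. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def luhn_valid(number: str) -> bool:
    """Check digit audit using the Luhn algorithm (stand-in routine)."""
    if not number.isdigit():
        return False
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:       # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def in_range(value: int, low: int, high: int) -> bool:
    """Range audit: True when low <= value <= high."""
    return low <= value <= high


def resolve_keyings(entries: List[str]) -> Optional[str]:
    """Return the value keyed by at least two sources, or None if all differ."""
    value, count = Counter(entries).most_common(1)[0]
    return value if count >= 2 else None
```

A field for which `resolve_keyings` returns `None` after the third pass would presumably be left for a skilled operator at step 100.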

Step 100 is the first step requiring a skilled human operator. Prior to step 100, all roles for clerical activities may be accomplished by very low-skilled clerks requiring little training. The low-skilled clerks simply key what they see or perform simple manual operations.

During step 100, clerks solve problems. For instance, if a zip code does not match a city, clerks determine which of the two is correct, and subsequently change the wrong element. Clerks may delete incomplete transactions or create two or three separate transactions from one piece of paper. In any event, human judgment is exercised in step 100.

Step 102 relates to the storing of the image. The graphics image, all carved graphical data areas and interpretive character data that has been extracted from the graphics image and all the operator and machine statistics generated in the process are written into storage in the storage management subsystem 20. For example, the identity of the operator who ran the document scanner 32 and the identity of the operator who keyed the social security numeric field 120 may be written into the storage management subsystem 20. This type of instructive information is coded in the data stored on the mass storage media.

Step 104 relates to the "address" or location of the image in storage. A variety of data elements are cross-referenced, one to another, and assembled in a database. A database can be as simple as a "flat file," random, or can be constructed upon some relational database model.
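
By way of illustration only, the cross-referencing of data elements to storage addresses may be sketched as an in-memory index. The record layout, the key names and the `address` strings are hypothetical; an actual system could equally use a flat file or a relational database as stated above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]
Index = Dict[Tuple[str, str], List[str]]


def build_index(records: Iterable[Record], keys: Iterable[str]) -> Index:
    """Map each (key name, key value) pair to the storage addresses of matching images."""
    key_list = list(keys)
    index: Index = defaultdict(list)
    for record in records:
        for key in key_list:
            index[(key, record[key])].append(record["address"])
    return index


def lookup(index: Index, key: str, value: str) -> List[str]:
    """Return every storage address filed under the given key value."""
    return list(index.get((key, value), []))
```

The same index serves retrieval by any of the selected keys, for example by item sequence number or by an account identifier.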

Step 106 relates to extracting data from the database and sending it to a host computer system 48. The data sent may be taxpayer records, policyholder records or some other business records. The data is sent using any one of a variety of traditional software protocols. Data is sent to the host computer system 48 via the application subsystem 12 (see FIGS. 1 and 3).

Step 108 relates to computer-based rules applied to data to classify the transaction or to automatically process the transaction. For example, it may be most efficient to pay a simple insurance claim if it is within a model of an acceptable claim, rather than have a human professional spend time judging the merits of the claim. Additionally, it may be useful to classify insurance claims so that adjudicators can specialize, negating the requirement that human adjudicators be trained to understand all nuances of each type of claim. For example, an insurance adjudicator may specialize in thoracic injuries. By applying such rules relating to a specific type of claim, transactions may be categorized and queued for suitable adjudicators with like skills and with more intensive specific training in one particular area.
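
By way of illustration only, the rules of step 108 may be sketched as a function that either approves a simple claim automatically or names the specialist queue for it. The claim fields (`amount`, `complete`, `category`), the payment limit and the queue naming are all hypothetical; the specification does not define the model of an acceptable claim.

```python
from typing import Dict, Union

Claim = Dict[str, Union[str, float, bool]]


def classify_claim(claim: Claim, auto_pay_limit: float = 500.0) -> str:
    """Return "auto_pay" for a simple claim, else the specialist queue name."""
    # Assumed model of an acceptable claim: complete and within the limit.
    if claim["complete"] and claim["amount"] <= auto_pay_limit:
        return "auto_pay"
    # Otherwise route by category so adjudicators can specialize.
    return "queue:" + str(claim["category"])
```

A claim categorized as a thoracic injury and exceeding the limit would thus be queued for adjudicators trained in that area.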

Step 110 relates to organization of queues which may have been established during step 108, or in the absence of step 108, by a human supervisor. In any event, work is scheduled from one professional task to another. Tasks are monitored by the system so that they are continually queued for the next workstation 14. The system automatically follows up when work is not passed to the next step in a workflow process before a particular pre-set deadline. Hard copy or facsimile output is selectively permitted at output step 111.
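
By way of illustration only, the automatic follow-up on missed deadlines may be sketched as a scan of open tasks. The task record and the name `overdue_tasks` are hypothetical, and what the system does with an overdue task (for example, notifying a supervisor) is left open.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Task(NamedTuple):
    """A unit of queued work (hypothetical record)."""
    task_id: str
    deadline: datetime
    passed_on: bool  # True once the work has moved to the next step


def overdue_tasks(tasks: List[Task], now: datetime) -> List[str]:
    """Return the ids of tasks not yet passed on whose deadline has elapsed."""
    return [t.task_id for t in tasks if not t.passed_on and t.deadline < now]


if __name__ == "__main__":
    start = datetime(2000, 1, 1, 9, 0)
    queue = [Task("claim-1", start + timedelta(hours=1), False)]
    print(overdue_tasks(queue, start + timedelta(hours=2)))
```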

Step 112 relates to the support of customer inquiries received by mail or phone. In processing customer inquiries, graphics images are retrieved and presented to a clerk who services those inquiries upon demand. The retrieval may be based on the item sequence number printed and assigned in step 86, if that number has been transmitted to the host computer. On the other hand, retrieval may be based upon any one of the indices or keys selected during step 104 or other conventional methods common to database applications.

Although the best modes contemplated for carrying out the present invention have been herein shown and described, it will be apparent that modification and variation may be made without departing from what is regarded to be the subject matter of the invention. For example, although digital apparatus is disclosed, appropriate analog equivalents may also be employed to form an analog combination suitable for practicing the present invention. It will be apparent to one skilled in the art that relatively simple software instructions may be devised to carry out the exact sequence of steps described in the present specification.

What is claimed is:
 1. A method of electronically processing data from one or more documents to facilitate user interaction with the data, comprising: a) generating a plurality of first output signals representing electronic images of said documents, said output signals being generated by feeding a series of documents associated together as a transaction sequentially through an optical scanning device whereby transaction integrity is maintained; b) generating an electric signal at the beginning and end of the transaction for separating the images of one transaction from those of another; c) identifying at least one of the documents by reference to identification areas or identification words found on the documents, the identification occurring by reference to geographical location or pel pattern techniques; d) extracting data fields from at least one of the documents by generating a plurality of second output signals representing said data fields; e) storing said first and second output signals for subsequent processing; and f) managing the processing of transactions to support adjudication processes and customer inquiries.
 2. A method of electronically processing data to facilitate user interaction with the data, comprising: (a) feeding documents through an optical scanning device; (b) recording electronic images of documents; (c) identifying document formats and transaction boundaries using identification areas or identification words; (d) extracting data fields from identified document images using automatic character recognition techniques and key correction; (e) recording electronic data; and (f) transmitting recorded images and data to digital storage for subsequent processing.
 3. The method of claim 2 further comprising the identification of document formats by automatic recognition techniques.
 4. The method of claim 2 further comprising the identification of document formats by pel pattern techniques.
 5. The method of claim 2 further comprising feeding documents of varying size through said optical scanning device.
 6. The method of claim 2 further comprising feeding intermixed documents of varying format through said optical scanning device.
 7. The method of claim 2 further comprising feeding documents through said optical scanning device in either a properly justified or inverted orientation.
 8. The method of claim 2 further comprising the step of printing item sequence numbers upon said documents as said documents pass through said optical scanning device.
 9. The method of claim 2 further comprising the step of correcting for document skew resulting from document misalignment during scanning.
 10. The method of claim 2 further comprising extracting data fields from document images using a predictor field to determine whether machine printed or pen printed data resides in the data field.
 11. The method of claim 2 further comprising the step of circulating data in a logical error reduction sequence to reduce keying errors.
 12. The method of claim 2 further comprising the recording of statistical summaries of data key operator errors for evaluation of operator performance.
 13. A method of electronically processing data from documents to electronic images to facilitate user interaction with the data, comprising: (a) feeding documents through an optical scanning device, said documents proceeding through said optical scanning device in either a properly justified or inverted orientation, said documents being of varying size and different formats; (b) printing item sequence numbers upon said documents as said documents pass through said optical scanning device; (c) recording electronic images of documents; (d) identifying document formats using identification areas or identification words, the identification of document formats occurring by reference to geographic location; (e) extracting data fields from identified document images using automatic character recognition techniques and key correction; (f) correcting for document skew resulting from document misalignment during scanning; (g) recording electronic data; and (h) transmitting recorded images and data to digital storage for subsequent processing.
 14. The method of claim 13 further comprising extracting data fields from document images using a predictor field to determine whether machine or pen printed data resides in the data field.
 15. The method of claim 13 further comprising the step of circulating data in a logical error reduction sequence to reduce keying errors.
 16. The method of claim 13 further comprising the recording of statistical summaries of data key operator errors for evaluation of operator performance.
 17. An optical disc-based transaction processing system for performing business transactions by user interaction with electronically stored data comprising: (a) a local area network for managing interaction of separate components of said transaction processing system; (b) an image acquisition subsystem operatively connected to said local area network, said image acquisition subsystem providing digital image data input to said local area network, said image acquisition subsystem coordinating the capture and transfer of electronic images; (c) a capture management subsystem operatively connected to said local area network, said capture management subsystem functioning to carve data fields from digital document images; (d) an application subsystem operatively connected to said local area network, said security and control subsystem operating to direct transaction processing events in sequence; (e) application support workstation for user interaction with said local area network; and (f) a storage management subsystem for storing electronic digital data, said stored data being available for user interaction.
 18. A system for optically capturing, storing, and retrieving data using electronic images, which system comprises: (a) means for optically scanning intermixed documents of various size and different formats; (b) means to separate checks from other pages; (c) means for recording electronic images of documents to facilitate document identification using identification areas or identification words, said means for identification of document formats occurring by automatic recognition techniques; (d) means for extracting data fields from said electronic images using automatic character recognition techniques and key correction to record electronic data; (e) means to record electronic data extracted from said data fields; (f) means for transmitting said recorded electronic data to a host computer; (g) means for selectively retrieving data as necessary in performing business transactions; and (h) means for indexing and cross-referencing stored data and electronic images.
 19. The apparatus of claim 18 wherein the means for optically scanning documents includes means for feeding documents through said optical scanning means in either properly justified or inverted orientation.
 20. The apparatus of claim 18 wherein the means for optically scanning documents further includes means for printing item sequence numbers upon said documents as said documents pass through said optical scanning means.
 21. The apparatus of claim 18 wherein the means for optically scanning documents further includes means for correcting document skew resulting from document misalignment during scanning.
 22. The apparatus of claim 18 wherein the means for extracting data fields further includes means for circulating or re-routing data in a logical error reduction sequence to reduce keying errors.
 23. The apparatus of claim 18 wherein the means for optically scanning documents includes a wand and related switch mechanism associated with the document feeder to permit the definition of a transaction boundary.
 24. A combination for optically capturing, storing and retrieving data using electronic images, which system comprises: (a) means for optically scanning intermixed documents of various size and different formats to maintain transaction integrity in the processing of business transactions; (b) means to separate checks from other pages; (c) means for printing item sequence numbers upon said documents at or near the time when said documents pass through said optical scanning device; (d) recording electronic images of documents; (e) identifying document formats using identification areas or identification words, the identification of document formats occurring by automatic recognition techniques; (f) extracting data fields from identified document images using automatic character recognition techniques and key correction; (g) correcting for document skew resulting from document misalignment during scanning; (h) recording electronic data; and (i) transmitting recorded images and data to digital storage for subsequent processing.
 25. The combination of claim 24, where there is further included means for sensing the edge of a document, said means for sensing being connected to said means for printing item sequence numbers for causing said means for printing to print the sequence number in the margin of the documents.
 26. A method of converting graphics to character data, comprising the steps of: producing a graphics image of a document; identifying the graphics image by comparing portions of the image against a series of identifiers for a match; extracting at least one graphical data area from a selected portion of the graphics image a vector distance from the matched portion; converting the graphical data area to a character string by processing individual graphical portions of the data area and converting each portion to a character positioned in the string at a location associated with the location of the graphical portion from which it is derived; displaying the graphical data area along with the character string; and displaying an unconverted portion of the graphical data area as a universal character intermixed in the character string and located within the character string at a position associated with its location in the graphical data area.
 27. The method of claim 26 wherein there is further included the step of keying in a character representative of the unconverted portion of the graphical data and positioning the character within the character string at a position associated with the location of the universal character to replace the universal character.
 28. The method of claim 27, wherein there is further included the step of displaying the character string with the keyed character in place of the character string and the intermixed universal character.
 29. The method of claim 26, wherein the step of converting each portion of the graphical data area leaves the original graphics image unaltered.
 30. The method of claim 26, wherein each character is an ASCII character.
 31. The method of claim 26, wherein the universal character is an asterisk.
 32. A combination, comprising: means for sequentially scanning a series of documents associated together as a transaction and producing a sequential series of graphical images of the documents, said means for scanning and producing for supplying a graphical image of each document; means for generating a signal to identify the end of the transaction; and means for producing a unique data key responsive to the reception of the signal for locating each sequential series of graphical images representing a transaction.
 33. The combination of claim 32, wherein said means for generating a signal is a wand located between each series of documents.
 34. The combination of claim 32, wherein said means for generating a signal is a unique document placed first in the series of documents associated as a transaction.
 35. The combination of claim 34, wherein said combination includes means for identifying an identification area in the graphical image of the unique document.
 36. The combination of claim 35, wherein said means for identifying, generating a signal to signify that the preceding document was the end of the transaction.
 37. A method of improving the efficiency of entry of data from pages included in a plurality of transactions into a data processing system, the paper having intermixed document types, the method wherein a transaction comprises one or more pages reducing the amount of labor involved with handling of the documents, the method comprising steps of: acquiring on a transaction by transaction basis an electronic image of a page in each transaction without pre-sorting pages according to page type; and automatically comparing with a computer each electronic image to predefined document types for identifying the page type for facilitating subsequent data extraction from the image of the page.
 38. The method of claim 37 further comprising the step of maintaining automatically a connection between an image of a page and the transaction of which it is a part.
 39. The method of claim 38 further comprising the steps of storing the electronic images and retrieving the electronic images during an image-based work flow management process.
 40. The method of claim 37 further comprising the step of deskewing an acquired electronic image to facilitate the step of automatically comparing.
 41. The method of claim 37 further comprising the step of extracting automatically data from the images based on identification of its page type.
 42. The method of claim 41 further comprising the step of electronically sorting identified electronic images according to type.
 43. The method of claim 41 wherein the step of extracting includes the step of extracting from at least a portion of an electronic image data using an automatic character recognition process.
 44. The method of claim 41 wherein the step of extracting includes the step of manual key entry of data from the image.
 45. The method of claim 37 wherein the step of acquiring includes the step of receiving with a facsimile receiving device images of documents related to a single transaction having data to be extracted.
 46. The method of claim 37 wherein the step of acquiring includes the step of transporting across a scanner paper documents that have intermixed formats one transaction at a time.
 47. The method of claim 46 wherein the step of comparing includes the steps of identifying a preselected page type during the step of capturing, the pages so identified being diverted during transport into a pocket for special processing.
 48. The method of claim 37 wherein the step of comparing of the electronic image for identification of document type includes comparing the size of each electronic image for identification.
 49. The method of claim 37 wherein the step of comparing includes the step of identifying the page type by automated recognition of a preselected identifying reference area on the document.
 50. The method of claim 37 wherein the step of comparing includes the step of identifying the page type using a pel pattern technique.
 51. A method of data extraction from images of pages having multiple numbers of different types of data fields comprising the steps of: identifying an electronic image of a page as having a known format; electronically carving from the electronic image a portion thereof based on the identified known format of the page, the image portion being a graphical representation of a data field on the document from which data is to be extracted; and distributing electronically the carved image portion to means for extracting data from the graphical representation of the data field.
 52. The method of claim 51 wherein the means for extracting includes an automated character reader for reading data from the image portion.
 53. The method of claim 52 wherein the step of distributing the carved portions includes the step of distributing carved image portions having characters in the graphical data field not capable of being read by the automated character reader to an operator correction means for display to and correction by an operator.
 54. The method of claim 51 wherein the means for extracting includes operator key entry means.
 55. The method of claim 51 wherein: the step of electronically carving includes carving from a plurality of electronic images of pages portions thereof, each portion including a graphical representation of a data field from which data is to be extracted; the step of distributing includes distributing each of the plurality of carved image portions to one of a plurality of means for extracting data.
 56. The method of claim 55 wherein the step of distributing includes distributing each of a plurality of carved image portions to one of a plurality of means for extracting data based on the graphical data field's type.
 57. The method of claim 51 wherein the step of carving includes the step of carving from an electronic image of a document multiple portions thereof, each portion containing graphical representation of a data field from which data is to be extracted.
 58. The method of claim 57 wherein the step of distributing includes the step of distributing image portions to a plurality of means for extracting data depending on the graphical data field's type.
 59. The method of claim 57 wherein the means for extracting includes an automated reader adapted for reading a particular type of graphical data field, and wherein the step of distributing includes the step of distributing graphical data fields of the particular type to the automated reader.
 60. The method of claim 59 wherein the means for extracting further includes an operator entry device for extracting data from an electronic image that is not readable by the automated character reader.
 61. A method of automated entry of data relating to a transaction into a data processing system from pages having intermixed page types while maintaining transaction integrity and reducing the amount of labor involved with handling of the documents, the method comprising the steps of: capturing electronic images of pages having intermixed page types on a transaction by transaction basis; identifying automatically with a computer a page type of at least one of the electronic images of the pages; carving a portion of the identified electronic image containing a graphical data field based on a known format of the page type; and distributing the carved image portion to means for reading data from the graphical data field.
 62. The method of claim 61 wherein the step of distributing includes distributing an image portion to automated character recognition means and to manual entry means depending on the graphical data field being distributed.
 63. The method of claim 61 further including the step of distributing an identified electronic image of a page to means for reading data from the electronic image of the entire page.
 64. The method of claim 61 wherein the means for reading includes an automated character recognition means and wherein the method further includes the step of distributing a graphical data field containing at least one character not readable by the automated character recognition means to means for correcting data.
 65. The method of claim 61 wherein the means for reading includes an automated character recognition means and wherein the method further includes the step of distributing a portion of graphical data field containing at least one character not readable by the automated character recognition means to means for manually correcting the at least one character, the means for manually correcting displaying a plurality of unrelated image portions in a manner to facilitate a rate of entry of unreadable characters.
 66. The method of claim 65 further including the steps of storing the images and of performing image based work flow processing using the stored images.
 67. The method of claim 61 further including a step of collecting data read from each page in each transaction for storage in data storage.
 68. The method of claim 67 wherein the means for reading includes a plurality of readers; and wherein the step of distributing includes distributing each of a plurality of graphical data fields to means for reading without regard for the page from which the graphical data fields.
 69. The method of claim 67 further including the step of performing work flow processing using the collected read data that is retrieved from data storage.
 70. The method of claim 61 wherein: the step of carving includes a step of carving graphical data fields of certain of the electronic images based on predefined formats associated with the page type that locate fields within the image from which data is desired to be extracted; and the step of distributing includes a step of distributing each carved graphical data field to one of a plurality of readers based on a field type.
 71. A method of manually correcting unreadable characters from anautomated character recognition device comprising the stepsof:concurrently displaying a plurality of image portions from at leastone electronic page image, each portion having at least one graphicalcharacter that cannot be read by the automated character recognitionprocess and that is presented in a predetermined relationship to theother portions, the display and relationship of the plurality of imageportions tending to make more efficient the rate of correction ofunreadable characters with a manual key entry means; and providingoperator entry means associated with the electronic display for enteringcharacters that are not readable.
 72. The method of claim 71 wherein theconcurrently displayed image portions are from a plurality of electronicpage images.
73. The method of claim 71 further comprising the step of displaying on the display near the image portions any data characters read by the automated character recognition process within the displayed image portion with an indication of any characters in the image portion that were not read.
74. The method of claim 71 further comprising the steps of carving from an electronic page image of a document page portion having a graphical data field containing data to be extracted and presenting the carved graphical data field to an automated character recognition process as a portion of the electronic page image to be read; and displaying on the manual entry means predetermined portions of the graphical data field, at least one of which is not readable by the automated character recognition process.
75. The method of claim 71 further comprising the steps of: carving from an electronic page image a graphical data field containing data to be extracted; distributing the graphical data field to an automated character recognition process and reading the graphical data field; and displaying the entire graphical data field for correction with the manual key entry means by keying in the entire data field if at least one of the characters in the graphical data field is not readable by the automated character recognition process.
76. An image capture and data extraction system comprising: an image acquisition system for electronically capturing page images of intermixed types on a transaction by transaction basis while maintaining transaction integrity; a data processing system for identifying a page image as having a known format, carving from an identified page image a portion of the image containing a data field based on its identified format, extracting character data from the carved image of the data field, and assembling extracted data into a data record for the transaction for storage.
77. The image capture and data extraction system of claim 76 wherein the data processing system includes a plurality of computers linked together in a network and includes a capture management subsystem for extracting data from electronic images, a storage management subsystem for storing electronic images and an application management subsystem for controlling the flow of electronic images and data between the image acquisition system, the capture management subsystem and the storage management subsystem.
78. The image capture and data extraction system of claim 77 wherein the data processing system further includes data correction and entry work stations.
79. The image capture and data extraction system of claim 77 further including at least one image-based work flow management work station coupled to the storage management subsystem, the image-based work flow management work station retrieving from the storage management subsystem data and images.
80. The image capture and data extraction system of claim 76 further comprising an image-based work flow management system operating on the stored data record.
81. A system for optically capturing, storing, and retrieving data using electronic images, which system comprises: (a) means for optically scanning documents; (b) means for recording electronic images of documents to facilitate document identification using identification areas or identification words; (c) means for extracting data fields from said electronic images using automatic character recognition techniques and key correction; (d) means to record electronic data extracted from said data fields; (e) means for transmitting said recorded electronic data to a host computer; and (f) means for selectively retrieving data as necessary in performing business transactions; wherein the means for optically scanning documents includes means to separate checks from other pages.
82. A system for optically capturing, storing, and retrieving data using electronic images, which system comprises: (a) means for optically scanning documents; (b) means for recording electronic images of documents to facilitate document identification using identification areas or identification words; (c) means for extracting data fields from said electronic images using automatic character recognition techniques and key correction; (d) means to record electronic data extracted from said data fields; (e) means for transmitting said recorded electronic data to a host computer; (f) means for selectively retrieving data as necessary in performing business transactions; and (g) means for indexing and cross-referencing stored data and electronic images.
83. A system for optically capturing, storing, and retrieving data using electronic images, which system comprises: (a) means for optically scanning documents; (b) means for recording electronic images of documents to facilitate document identification using identification areas or identification words; (c) means for extracting data fields from said electronic images using automatic character recognition techniques and key correction; (d) means to record electronic data extracted from said data fields; (e) means for transmitting said recorded electronic data to a host computer; and (f) means for selectively retrieving data as necessary in performing business transactions; wherein the means for recording electronic images of documents includes means for identifying document formats by pel pattern techniques.
84. The apparatus of claim 83 further including a scanner sensor means to detect document edge alignment while printing item sequence numbers.
 85. A system for optically capturing, storing,and retrieving data using electronic images, which system comprises:(a)means for optically scanning documents; (b) means for recordingelectronic images of documents to facilitate document identificationusing identification areas or identification words; (c) means forextracting data fields from said electronic images using automaticcharacter recognition techniques and key correction; (d) means to recordelectronic data extracted from said data fields; (e) means fortransmitting said recorded electronic data to a host computer; and (f)means for selectively retrieving data as necessary in performingbusiness transactions;wherein the means for optically scanning documentsfurther includes means for printing item sequence members upon saiddocuments as said documents pass through said optical scanning means topermit a contiguous document filling and retrieval system.
86. A system for optically capturing, storing, and retrieving data using electronic images, which system comprises: (a) means for optically scanning documents; (b) means for recording electronic images of documents to facilitate document identification using identification areas or identification words; (c) means for extracting data fields from said electronic images using automatic character recognition techniques and key correction; (d) means to record electronic data extracted from said data fields; (e) means for transmitting said recorded electronic data to a host computer; and (f) means for selectively retrieving data as necessary in performing business transactions; wherein the means for optically scanning documents includes means for correcting document skew resulting from document misalignment during scanning.
87. A system for optically capturing, storing, and retrieving data using electronic images, which system comprises: (a) means for optically scanning documents; (b) means for recording electronic images of documents to facilitate document identification using identification areas or identification words; (c) means for extracting data fields from said electronic images using automatic character recognition techniques and key correction; (d) means to record electronic data extracted from said data fields; (e) means for transmitting said recorded electronic data to a host computer; and (f) means for selectively retrieving data as necessary in performing business transactions; wherein the means for extracting data fields further includes means for producing whether machine printed or pen printed data resides in said data field.
88. A system for optically capturing, storing, and retrieving data using electronic images, which system comprises: (a) means for optically scanning documents; (b) means for recording electronic images of documents to facilitate document identification using identification areas or identification words; (c) means for extracting data fields from said electronic images using automatic character recognition techniques and key correction; (d) means to record electronic data extracted from said data fields; (e) means for transmitting said recorded electronic data to a host computer; and (f) means for selectively retrieving data as necessary in performing business transactions; wherein the means for extracting data fields further includes means for circulating or re-routing data in a logical error reduction sequence to reduce keying errors.
89. A system for optically capturing, storing, and retrieving data using electronic images, which system comprises: (a) means for optically scanning documents; (b) means for recording electronic images of documents to facilitate document identification using identification areas or identification words; (c) means for extracting data fields from said electronic images using automatic character recognition techniques and key correction; (d) means to record electronic data extracted from said data fields; (e) means for transmitting said recorded electronic data to a host computer; (f) means for selectively retrieving data as necessary in performing business transactions; and (g) means for recording statistical summaries of data key operator errors for evaluation of operator performance.
90. A system for optically capturing, storing, and retrieving data using electronic images, which system comprises: (a) means for optically scanning documents; (b) means for recording electronic images of documents to facilitate document identification using identification areas or identification words; (c) means for extracting data fields from said electronic images using automatic character recognition techniques and key correction; (d) means to record electronic data extracted from said data fields; (e) means for transmitting said recorded electronic data to a host computer; and (f) means for selectively retrieving data as necessary in performing business transactions; wherein the means for optically scanning documents includes a wand and a related switch mechanism associated with the document feeder to permit the definition of a transaction boundary.
91. A system for optically capturing, storing, and retrieving data using electronic images, which system comprises: (a) means for optically scanning documents; (b) means for recording electronic images of documents to facilitate document identification using identification areas or identification words; (c) means for extracting data fields from said electronic images using automatic character recognition techniques and key correction; (d) means to record electronic data extracted from said data fields; (e) means for transmitting said recorded electronic data to a host computer; and (f) means for selectively retrieving data as necessary in performing business transactions; wherein the means for optically scanning documents further includes means for separating checks from other documents.
92. A system for optically capturing, storing, and retrieving data using electronic images, which system comprises: (a) means for optically scanning documents; (b) means for recording electronic images of documents to facilitate document identification using identification areas or identification words; (c) means for extracting data fields from said electronic images using automatic character recognition techniques and key correction; (d) means to record electronic data extracted from said data fields; (e) means for transmitting said recorded electronic data to a host computer; and (f) means for selectively retrieving data as necessary in performing business transactions; wherein the means for optically scanning documents further includes means for maintaining transaction integrity in the processing of said business transactions.
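
By way of illustration only, and without limiting the claims, the data flow recited in claims 70 and 76 (identifying a page format, carving data fields by that format, distributing each carved field to a reader selected by field type, and assembling the read data into a transaction record) might be sketched as follows. Everything in the sketch is an assumption made for the example: a page image is stood in for by a grid of text characters rather than pixel data, the readers are trivial checks rather than character recognition engines, and the names `FieldSpec`, `FORMATS`, `READERS`, and `process_transaction` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical description of where one data field sits on a page."""
    name: str
    field_type: str  # selects the reader, e.g. "numeric" or "alpha"
    row: int
    col_start: int
    col_end: int


# Hypothetical predefined formats: page type -> field locations.
FORMATS: Dict[str, List[FieldSpec]] = {
    "claim_form": [
        FieldSpec("policy_no", "numeric", 1, 8, 14),
        FieldSpec("surname", "alpha", 2, 8, 14),
    ],
}


def identify_page(page: List[str]) -> Optional[str]:
    """Identify the page type from an identification word on the first row."""
    for page_type in FORMATS:
        if page and page_type.upper() in page[0]:
            return page_type
    return None


def carve(page: List[str], spec: FieldSpec) -> str:
    """Cut out the region of the page that holds one data field."""
    return page[spec.row][spec.col_start:spec.col_end]


def read_numeric(snippet: str) -> Optional[str]:
    """Stand-in reader: returns None when any character is unreadable."""
    text = snippet.strip()
    return text if text.isdigit() else None


def read_alpha(snippet: str) -> Optional[str]:
    """Stand-in reader for alphabetic fields."""
    text = snippet.strip()
    return text if text.isalpha() else None


# Readers keyed by field type, so each carved field goes to a suitable reader.
READERS: Dict[str, Callable[[str], Optional[str]]] = {
    "numeric": read_numeric,
    "alpha": read_alpha,
}


def process_transaction(
    pages: List[List[str]],
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Return (assembled data record, fields needing manual correction)."""
    record: Dict[str, str] = {}
    rejects: List[Tuple[str, str]] = []
    for page in pages:
        page_type = identify_page(page)
        if page_type is None:
            continue  # unidentified pages contribute no extracted data here
        for spec in FORMATS[page_type]:
            snippet = carve(page, spec)
            value = READERS[spec.field_type](snippet)
            if value is None:
                rejects.append((spec.name, snippet))
            else:
                record[spec.name] = value
    return record, rejects
```

In this sketch, a field that a reader cannot fully read is returned in the second list together with its carved region, which loosely corresponds to distributing the field to means for correcting data as in claim 64; how such fields are displayed and keyed is not modelled.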